Combining multi-source far distance speech recognition strategies: beamforming, blind channel and confusion network combination
Authors
Abstract
The automatic speech recognition (ASR) research community has recently focused on recognizing speech captured with a microphone located in the medium field, rather than mounted on a headset and positioned next to the speaker’s mouth. The capacity to recognize such speech is a primary requirement for making ASR a viable modality for so-called ubiquitous computing. This is a natural application for multiple microphones, whose signals can be combined in different ways. On the signal side, combination can be accomplished by beamforming with a microphone array or by blind source separation. On the word hypothesis side, combination can be achieved through confusion network combination. In this work, we compare the effectiveness of these combination techniques with one another and with the performance achieved using a close-talking microphone.
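As an illustration of signal-side combination, the following is a minimal sketch of delay-and-sum beamforming, the simplest beamforming scheme. It is not taken from the paper, which does not specify its beamformer here. The function name `delay_and_sum` and the assumption that per-microphone delays are already known as whole numbers of samples are hypothetical simplifications.

```python
import numpy as np


def delay_and_sum(channels, delays):
    """Align each microphone channel by its sample delay, then average.

    channels: array of shape (num_mics, num_samples)
    delays:   per-microphone arrival delay in samples (non-negative ints);
              assumed known here, whereas a real system must estimate them
    """
    channels = np.asarray(channels, dtype=float)
    num_mics, num_samples = channels.shape
    aligned = np.zeros_like(channels)
    for m, d in enumerate(delays):
        # Advance channel m by d samples so the target speech lines up
        # across microphones; the tail is left zero-padded.
        aligned[m, : num_samples - d] = channels[m, d:]
    # Averaging keeps the aligned speech and attenuates uncorrelated noise.
    return aligned.mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speech = rng.standard_normal(1000)
    delays = [0, 3, 7]
    # Simulate three microphones that receive the same signal at different times.
    mics = np.stack(
        [np.concatenate([np.zeros(d), speech[: 1000 - d]]) for d in delays]
    )
    combined = delay_and_sum(mics, delays)
    print(np.allclose(combined[:993], speech[:993]))
```

With independent noise on each channel, averaging the aligned signals lowers the noise power while the speech adds coherently, which is the motivation for combining microphones at the signal level.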
Similar resources
Feature mapping using far-field microphones for distant speech recognition
Acoustic modeling based on deep architectures has recently gained remarkable success, with substantial improvement of speech recognition accuracy in several automatic speech recognition (ASR) tasks. For distant speech recognition, the multi-channel deep neural network based approaches rely on the powerful modeling capability of deep neural network (DNN) to learn suitable representation of dista...
An Efficient Combination of Multi-channel Acoustic Echo Cancellation with a Beamforming Microphone Array
For hands-free man-machine audio interfaces with multi-channel sound reproduction and automatic speech recognition (ASR), both a multi-channel acoustic echo canceller (M-C AEC) and a beamforming microphone array are necessary for sufficient recognition rates. Based on known strategies for combining single-channel AEC and adaptive beamforming microphone arrays, we discuss special aspects for the...
Multi-source far-distance microphone selection and combination for automatic transcription of lectures
In this work, we present our progress in multi-source far field automatic speech-to-text transcription for lecture speech. In particular, we show how the best of several far field channels can be selected based on a signal-to-noise ratio criterion, and how the signals from multiple channels can be combined at either the waveform level using blind channel combination or at the hypothesis level u...
Combined Multi-Channel NMF-Based Robust Beamforming for Noisy Speech Recognition
We propose a novel acoustic beamforming method using blind source separation (BSS) techniques based on non-negative matrix factorization (NMF). In conventional mask-based approaches, hard or soft masks are estimated and beamforming is performed using speech and noise spatial covariance matrices calculated from masked noisy observations, but the phase information of the target speech is not adeq...
Distant-Talking Speech Recognition Based on Spectral Subtraction by Multi-Channel LMS Algorithm
We propose a blind dereverberation method based on spectral subtraction using a multi-channel least mean squares (MCLMS) algorithm for distant-talking speech recognition. In a distant-talking environment, the channel impulse response is longer than the short-term spectral analysis window. By treating the late reverberation as additive noise, a noise reduction technique based on spectral subtrac...
Publication date: 2005